Data Activator allows us to create an interesting workaround for infinite notebook execution.
An infinite execution loop is useful for constant data ingestion, but Fabric doesn't allow an infinite loop inside a notebook: the execution ends up timing out and failing.
Scheduling the notebook on very tight schedules is one possibility, but it's not very trustworthy. You would need to know the precise run duration of the notebook, and it's not always the same. You end up either with big gaps between executions or with overlapping schedules. Neither option is good.
There is an interesting alternative option we can use. Here comes the workaround.
The Architecture of the workaround
The solution is to create a signal indicating whether the notebook is running or not. We can do this using a control table. A "control table" is a regular table in a lakehouse, but we will use it to control the execution flow of the notebook.
This table needs two fields: the date and time of the record insertion, and an integer value which we will set to 0 or 1.
The notebook we want to run needs to make an insert at the beginning of the execution and another at the end. At the beginning, we insert a record with the value 1 to indicate the notebook is executing. At the end, we insert a record with the value 0.
We can use Data Activator to monitor this table. Every time a record with the value 0 is inserted, we execute the notebook again. In this way, Data Activator becomes responsible for the infinite loop.
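As a sketch, the record the notebook inserts could look like this. The table name, field names, and function are assumptions for illustration, not anything fixed by Fabric:

```python
from datetime import datetime, timezone

def make_control_record(status: int) -> dict:
    """Build one control-table record: insertion timestamp plus a status flag.

    status: 1 = the notebook just started, 0 = the notebook just finished.
    """
    if status not in (0, 1):
        raise ValueError("status must be 0 or 1")
    return {"inserted_at": datetime.now(timezone.utc), "status": status}

# In a Fabric notebook, the record would then be appended to the lakehouse
# table, for example with Spark (hypothetical table name "control_table"):
#   spark.createDataFrame([make_control_record(1)]) \
#        .write.mode("append").saveAsTable("control_table")
```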
Ensuring the final Insert in the Table
It's essential that the final insert always executes at the end of the notebook, even if an error happens. If this final insert is not executed, the loop breaks. For this reason, we need to use a try/finally structure to guarantee the final insert executes.
```python
try:
    # (main execution code)
    ...
finally:
    # (insert in control table)
    ...
```
The main execution code could be long. The best way to organize it is to break the code down into functions. The main code structure then ends up simple: just the try/finally exemplified above plus calls to the functions that do the actual work. This keeps the core code quite simple.
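Putting these pieces together, the notebook skeleton might look like the following. This is a minimal sketch: the in-memory list stands in for the lakehouse control table, and `ingest_batch` is a hypothetical placeholder for the real ingestion work.

```python
from datetime import datetime, timezone

# Stand-in for the lakehouse control table (illustration only; in Fabric
# you would append to the real table instead).
control_table = []

def insert_control_record(status: int) -> None:
    """Append a control record: timestamp plus 1 (started) or 0 (finished)."""
    control_table.append(
        {"inserted_at": datetime.now(timezone.utc), "status": status}
    )

def ingest_batch() -> None:
    """Hypothetical placeholder for the notebook's actual work."""
    pass

insert_control_record(1)        # mark the notebook as running
try:
    ingest_batch()              # the real work, broken down into functions
finally:
    insert_control_record(0)    # always runs, even on error: keeps the loop alive
```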
Data Activator delay and the Notebook code
Data Activator has a 5-minute delay to execute a trigger. We always need to account for this delay.
You can minimize how often these breaks happen by increasing the execution time of the notebook: the notebook itself can and should contain a loop. The notebook loop can't be infinite, but you can extend it as far as possible without making the execution unstable. Data Activator will ensure the notebook is executed again after it finishes.
For example, I implemented scenarios where the notebook runs for 1 hour and 50 minutes. When it stops, it depends on Data Activator to be triggered again. In this way, a 5-minute gap can happen every 1 hour and 50 minutes.
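A time-bounded inner loop can be sketched like this. The 1h50m budget mirrors the scenario above, but the exact limit is an assumption you should tune to your own capacity and timeout settings:

```python
import time

def bounded_ingestion_loop(run_seconds: float, do_work) -> int:
    """Call do_work() repeatedly until the time budget is spent.

    Returns the number of completed iterations. The loop only checks the
    deadline between iterations, so each do_work() call should be short
    relative to the budget.
    """
    deadline = time.monotonic() + run_seconds
    iterations = 0
    while time.monotonic() < deadline:
        do_work()           # one ingestion cycle
        iterations += 1
    return iterations

# In the notebook, something like:
#   bounded_ingestion_loop(110 * 60, ingest_batch)  # ~1 hour 50 minutes
```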
Configuring Data Activator
Data Activator requires one of the following sources to be configured:
- A real-time ingestion EventStream
- A report visual
- A Kusto query in a real-time dashboard
Considering our scenario, with a lakehouse table, I consider the Kusto query in a real-time dashboard the best option.
I wrote before about how to use a Kusto database and shortcuts to allow lakehouses to work with real-time dashboards and Data Activator.
The main query is very simple: it reads the data from the control table. The real-time dashboard requires the query to use a time range filter so we can set the alert.
On the alert, we configure it to fire each time the value is set to 0.
Analyzing the execution time
An additional benefit of this method is the possibility of analyzing the execution times.
- The time difference from a record with value 1 to the next record with value 0 tells us the execution duration of the notebook.
- The difference from a record with value 0 to the next record with value 1 tells us the interval between one execution and the next.
By creating queries with these calculations, we can build a dashboard to analyze the execution process.
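The first of these calculations can also be sketched in Python over the control-table records, assuming the record shape used earlier (`inserted_at`, `status`) — these field names are illustrative:

```python
from datetime import datetime, timedelta

def execution_durations(records):
    """Pair each status-1 record with the next status-0 record and return
    the notebook run durations as timedeltas."""
    durations, start = [], None
    for rec in sorted(records, key=lambda r: r["inserted_at"]):
        if rec["status"] == 1:
            start = rec["inserted_at"]
        elif rec["status"] == 0 and start is not None:
            durations.append(rec["inserted_at"] - start)
            start = None
    return durations
```

The gaps between executions (the 0 → 1 differences) can be computed symmetrically by swapping the roles of the two status values.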
Summary
This is a clever and reliable system for continuous, effectively infinite execution. The only drawback is the Data Activator delay, which may not be acceptable for some scenarios.